1 Introduction

There are two problems in steganalysis: (1) detecting the existence of a hidden message
and (2) decoding the message. As terrorist groups have been known to use steganography
in planning their attacks, this has become an important problem of national security.
This research proposal is concerned only with the first problem, hidden message
detection using steganalysis [8].

The approach is to statistically analyze the least significant bit(s) of each color
dimension of each pixel to look for a pattern. In the absence of a hidden
message these bits should look like random noise. Adding a hidden message will affect
the entropy of the data [4]. This difference should be detectable by comparing the
entropy of unaltered picture files with the entropy of files with embedded steganography.
For this research, freely available software for embedding hidden messages will be used
to create sample image files to analyze. The R language will be used for the statistical
analysis.

The work plan consists of the following steps:

• Obtain sample JPEG files from the Internet or another source.

• Import these sample files as data files into R.

• Statistically analyze the least significant bits.

• Use steganography to hide messages in a sample of the JPEG files.

• Import the resulting files into R and statistically analyze the least significant
bits of the JPEG files with known hidden messages.

• Compare each with its original file in terms of entropy.
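As a rough illustration of the last steps of the plan, the sketch below computes the entropy of the least-significant-bit plane of an image held as an array of values n/255. The helper name lsb_entropy and the synthetic arrays are assumptions made for illustration only; the commented-out call to jpeg::readJPEG indicates how a real file might be loaded, with a placeholder file name. The synthetic cover is deliberately extreme (all values even), so the contrast shown here should not be expected from real photographs.

```r
# Hypothetical helper: entropy (in bits) of the least-significant-bit plane
# of an image stored as an array of values n/255 in [0, 1].
lsb_entropy <- function(x) {
  v   <- as.integer(round(x * 255))   # back to integers 0..255
  lsb <- bitwAnd(v, 1L)               # keep only the last bit
  p   <- tabulate(lsb + 1L, nbins = 2L) / length(lsb)
  p   <- p[p > 0]                     # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

# Synthetic stand-in for an image (rows x columns x channels).
# With a real file one might instead use: x <- jpeg::readJPEG("cover.jpg")
set.seed(1)
cover <- array(sample(seq(0L, 254L, by = 2L), 48, replace = TRUE) / 255,
               dim = c(4, 4, 3))

# Contrived "stego" array: the LSBs are overwritten with alternating 0/1 bits.
stego <- (round(cover * 255) + rep(0:1, length.out = length(cover))) / 255

h_cover <- lsb_entropy(cover)   # constant LSB plane
h_stego <- lsb_entropy(stego)   # balanced LSB plane
```

In this contrived case the LSB-plane entropy moves from 0 to 1 bit; with real images the two values may lie much closer together.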

2  Least  Significant  Bit 

One advantage of steganography is that the message does not attract
attention to itself. No party other than the sender and the recipient should know that
a message exists. Once the existence of the message has been compromised,
steganography simply fails [9].

2.1  Image  definition 

Historically, many strategies have been used to hide messages sent between two or more
parties. Currently, one of the means of hiding a message is the digital image. The mechanism
is to embed a message digitally in a picture by manipulating the bits representing the
different colors [9]. Let us assume that a picture is depicted in different shades of
gray, including black and white (monochrome and grayscale images). Each pixel is
represented by a string of 8 bits, so there are 2^8 = 256 different
tones of gray. These range from black, 00000000, through white,
11111111. A message of suitable size can be embedded in a cover (image) by
manipulating the bits of each pixel. Let us assume that in a particular pixel the gray is
represented by 00001111. By switching the second bit we obtain a new binary string,
01001111. This change has modified the original picture.

For 24-bit images, that is, digital color pictures in the RGB color model,
each color (red, green and blue) is represented by a string of 8 bits, so that each pixel can
be colored by manipulating each color (each string of bits) [9]. In each pixel there are
256 possible values for each color, so that there are 256^3 = 16,777,216 possible
color shades. Below, three 8-bit strings are given, each representing one
color. From an original pixel, say

00001010  00110101  00011110 

we can change the 4th bit, counting from the left, in each string, obtaining

00011010  00100101  00001110 
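The two bit switches above can be checked with a few lines of base R. This is only a sketch: the helper names flip_bit and to_bits are hypothetical, and bit positions are counted from the left, as in the examples of this section.

```r
# Hypothetical helper: flip bit `pos` (1 = leftmost of `nbits`) of integer values.
flip_bit <- function(value, pos, nbits = 8L) {
  mask <- bitwShiftL(1L, nbits - pos)
  bitwXor(as.integer(value), mask)
}

# Render integers as bit strings, most significant bit first.
to_bits <- function(value, nbits = 8L) {
  sapply(value, function(v)
    paste(rev(as.integer(intToBits(v))[seq_len(nbits)]), collapse = ""))
}

# Grayscale example: switch the second bit of 00001111.
gray      <- strtoi("00001111", base = 2)
gray_flip <- to_bits(flip_bit(gray, 2))

# RGB example: switch the 4th bit of each of the three channel strings.
rgb_px   <- strtoi(c("00001010", "00110101", "00011110"), base = 2)
rgb_flip <- to_bits(flip_bit(rgb_px, 4))
```

Here gray_flip reproduces the string 01001111, and rgb_flip reproduces the three modified strings shown above.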

2.2  Image  Compression 

Large images produce large files that are difficult to handle. Some
sort of compression is necessary in order to handle these images better. There are two
types of compression: lossy and lossless [9]. An example of the first type of
compression technique is the JPEG (Joint Photographic Experts Group) image format. For the
second type, we have GIF (Graphics Interchange Format) and the 8-bit BMP
(Microsoft Windows Bitmap file). In the first case loss of information occurs, while in
the second the integrity of the original information remains intact. The steganographic
technique implemented will depend on the compression technique used [9].

2.3  Least  Significant  Bit 

The object of steganography is to prevent suspicion about the existence of a message,
regardless of the medium used. Small changes in the tone of gray are imperceptible
to the human eye. The Least Significant Bit (LSB) technique is a simple approach to modifying an
image while keeping the change imperceptible to the human eye. By
considering the redundant bits (least significant bits), imperceptible changes are made
by changing the 8th (rightmost) bit in the string of eight bits. For example, by changing 00001111
to 00001110, we have applied the least significant bit technique.
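A minimal sketch of this idea, assuming pixel values are already available as integers from 0 to 255: each message bit overwrites the last bit of one pixel value, and the message is read back from those same bits. The helper names embed_lsb and extract_lsb and the sample values are hypothetical, not part of any package.

```r
# Hypothetical helper: overwrite the last bit of the first length(bits) pixels.
embed_lsb <- function(pixels, bits) {
  stopifnot(length(bits) <= length(pixels), all(bits %in% 0:1))
  idx <- seq_along(bits)
  # Clear the last bit (AND with 11111110), then set it to the message bit.
  pixels[idx] <- bitwOr(bitwAnd(pixels[idx], 254L), as.integer(bits))
  pixels
}

# Hypothetical helper: read back the last bit of the first n pixels.
extract_lsb <- function(pixels, n) bitwAnd(pixels[seq_len(n)], 1L)

pixels   <- c(15L, 200L, 37L, 90L, 255L, 0L, 128L, 64L)
msg_bits <- c(0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L)   # eight message bits
stego    <- embed_lsb(pixels, msg_bits)
recovered <- extract_lsb(stego, length(msg_bits))
```

The first pixel, 15 (00001111), becomes 14 (00001110), as in the example above, and no pixel value changes by more than 1.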

2.4  Significant  Bit  Image  Depiction 

Steganography fails in its purpose at the very moment the existence
of the message is compromised. Although steganography is not infallible, its
strength lies entirely in the message's existence remaining unknown. When
the means of communication is a picture from which a text or a message can be extracted,
its effectiveness is directly related to how the pixels are manipulated. In particular, when
manipulating the LSB, a message is safe as long as the change remains imperceptible to the
human eye. The following pictures show the level of visual perception in relation to
the change of bits of each channel (color) in each pixel. An image is compared before
and after the LSB technique has been applied. A simple procedure (switching the LSB
of each channel in each pixel) has produced a different image that simply cannot be
distinguished from the original one.


Figure  1:  Original  Depiction  (left);  LSB  Depiction  (right) 


We  are  unable  to  perceive  any  changes  in  the  image  after  the  LSB  technique. 

Below, the images that result from switching bits number 2 and 3 start to show gradual
changes in color shades.


Changes made to bit number 4 and higher are clearly evident. Colors are
distorted and degraded.


Finally, switching bit number 8 in all channels of every pixel makes the modification
evident, so that it can readily be perceived by the human eye. In the numbering used
in this section, where bit number 1 is the least significant bit, bit number 8 is the most
significant, and it would be unwise to choose it if steganography is the intended
purpose. Below, we show the original image side by side with
the 8th-bit-switch depiction.
Figure 4: Switch of bit number 6 (left); Switch of bit number 7 (right)


Figure  5:  Original  depiction  (left);  Switch  of  bit  number  8  (right) 


2.5  R-Codes 

In the previous section we compared an original image with different modifications
obtained by switching one bit. We accomplished this using the R language
(versions 2.14 and 2.15). We loaded the packages jpeg, ReadImages, boolfun, RGraphics and many
others. The two crucial packages were ReadImages and boolfun, which provide read.jpeg and
toBin, respectively. The first reads an image in jpg or jpeg format. The
second returns the binary representation of an integer. Another function used from boolfun
is toInt, which returns an integer from a given binary representation.

The function below, imageManipulation2, switches the nth bit of each channel in
every pixel of an image in jpeg or jpg format. When the picture is read, a 3-dimensional
array is built. It contains the numeric representation of each color in every
pixel. The values in this array are real numbers between 0 and 1. These numbers are
normalized: they are of the form n/255, where n = 0, 1, ..., 255. They are
multiplied by 255, resulting in an integer ranging from 0 to 255, precisely 256 values.
The integers are converted to their binary representation, and a particular bit is switched
in each binary string. The resulting sequence is converted back to an integer and normalized. The
resulting array is an image different from the original.

# x is the image (a 3-dimensional array of values n/255)
# This function switches the bit at position n
# N is the length of the binary representation
# t is a title

library(ReadImages)  # read.jpeg and the plot method for images
library(boolfun)     # toBin and toInt

imageManipulation2 <- function(x, n, N, t)
{
    nrow <- dim(x)[1]
    ncol <- dim(x)[2]
    ncha <- dim(x)[3]
    if (n <= N && n >= 1)
    {
        for (i in 1:ncha)
        {
            for (j in 1:nrow)
            {
                for (k in 1:ncol)
                {
                    # undo the normalization; round guards against floating-point error
                    m <- round(x[j, k, i] * 255)
                    # toBin returns a binary representation of length N
                    y <- toBin(m, N)
                    y[n] <- !y[n]
                    # toInt returns an integer from a binary representation of a
                    # certain length; normalization (division by 255)
                    x[j, k, i] <- toInt(y) / 255
                }
            }
        }
    }
    plot(x, main = t)
    x
}
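The same bit switch can also be sketched without explicit loops, using only the bitwise functions of base R. This is an alternative written for illustration, not the function used to produce the figures: the name flip_bit_plane is hypothetical, plotting is left out, and bit position n is assumed to count from the least significant bit (n = 1), following the numbering of Section 2.4. Whether that matches the bit order returned by toBin has not been checked here.

```r
# Hypothetical vectorized variant: flip bit n (1 = least significant) of every
# channel value in an image array x whose entries have the form k/255.
flip_bit_plane <- function(x, n, N = 8L) {
  stopifnot(n >= 1, n <= N)
  v       <- as.integer(round(x * 255))          # integers 0..255
  flipped <- bitwXor(v, bitwShiftL(1L, n - 1L))  # toggle the chosen bit
  array(flipped / 255, dim = dim(x))             # normalize, restore the shape
}

# Tiny 2 x 2 single-channel example.
x <- array(c(15, 200, 37, 90) / 255, dim = c(2, 2, 1))
y <- flip_bit_plane(x, 1)            # switch the least significant bit
y_int <- round(as.vector(y) * 255)   # 14, 201, 36, 91
```

Applying the function twice with the same n restores the original array, since switching a bit is its own inverse.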

3  Entropy 

A great part of the development of the mathematical theory of communication, beginning
in the late 1940s, is due to Claude Shannon, a Bell Labs mathematician. Shannon is one of the
founders of the Information Age. Shannon made clear that uncertainty is the commodity
produced to satisfy wants or needs in a society of communication.

The amount of information, or uncertainty, output by an information source is measured
by its entropy. According to Shannon, the entropy of an information source S is
defined as

$$H(S) = \sum_i p_i \log(1/p_i), \qquad (1)$$

where $p_i$ is the probability that symbol $s_i$ in S occurs. The factor $\log(1/p_i)$ indicates
the amount of information contained in $s_i$, i.e., the number of bits needed to code $s_i$
when the logarithm is taken to base 2.

For example, in an image with a uniform distribution of gray-level intensity, i.e., $p_i = 1/256$,
the number of bits needed to code each gray level is $\log_2 256 = 8$. The entropy of
this image is therefore 8 bits.
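Equation (1) can be evaluated directly from the empirical frequencies of the pixel values. The sketch below uses base 2 logarithms and a hypothetical helper name, shannon_entropy; the two test vectors are synthetic stand-ins for images.

```r
# Hypothetical helper: empirical Shannon entropy (in bits) of a vector of values.
shannon_entropy <- function(values) {
  p <- as.numeric(table(values)) / length(values)  # observed frequencies p_i
  -sum(p * log2(p))                                # same as sum p_i * log2(1/p_i)
}

# Every gray level 0..255 appears equally often: p_i = 1/256.
uniform_image <- rep(0:255, times = 4)
h_uniform <- shannon_entropy(uniform_image)

# A single gray level carries no uncertainty.
constant_image <- rep(128L, 1024)
h_constant <- shannon_entropy(constant_image)
```

For the uniform vector the result is 8 bits, in agreement with the example above; for the constant vector it is 0.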

4  Steganalysis 

There are many approaches to detecting steganographic images with statistical tools.
In this section we mention some of them briefly. Many tests
can be performed on particular statistical properties, ranging from very simple to
more complex and sophisticated ones [8]. Westfeld and Pfitzmann observed that embedding
encrypted data into a GIF image changes the histogram of its color frequencies [8].
One property of encrypted binary data is that zeros and ones are
equally likely. Consider two colors A and B that differ only in their least significant
bit. In an image with embedded encrypted data, if color A occurs more often than color B,
then the LSB technique changes color A into color B more often than the other way around.
The embedding therefore reduces the difference in frequency between colors A and B. A simple diagram
(histogram) can depict these differences, so that they can be identified visually.
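The effect on pairs of values can be illustrated on synthetic data. The sketch below is not the procedure of [8]: it uses simulated 8-bit values rather than a GIF palette, random bits as a stand-in for encrypted data, and a hypothetical helper, pair_statistic, that sums a chi-square-style term over the pairs (2k, 2k + 1).

```r
set.seed(42)
n <- 10000L

# Synthetic "cover": even values are made three times as likely as odd ones.
cover <- sample(0:255, n, replace = TRUE, prob = rep(c(3, 1), 128))

# Full-capacity LSB embedding of bits that look random.
bits  <- sample(0:1, n, replace = TRUE)
stego <- bitwOr(bitwAnd(cover, 254L), bits)

# Hypothetical helper: chi-square-style statistic over the pairs (2k, 2k + 1).
# Small values mean the two frequencies within each pair are nearly equal.
pair_statistic <- function(values) {
  counts   <- tabulate(values + 1L, nbins = 256L)
  even     <- counts[seq(1, 255, by = 2)]
  odd      <- counts[seq(2, 256, by = 2)]
  expected <- (even + odd) / 2
  keep     <- expected > 0               # skip pairs that never occur
  sum((even[keep] - expected[keep])^2 / expected[keep])
}

stat_cover <- pair_statistic(cover)
stat_stego <- pair_statistic(stego)
```

Embedding only moves values within a pair, so the total count of each pair is unchanged while the two frequencies inside it are pulled together; stat_stego is accordingly much smaller than stat_cover in this simulation.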

Another approach to measuring statistical properties is to analyze the
frequency of the DCT (Discrete Cosine Transform) coefficients. The
empirical distribution can be compared with a theoretical distribution using a $\chi^2$
(chi-square) test. This is the traditional, frequentist use of statistical theory. However,
we must mention briefly that, by analogy, the statistical tools used in the frequentist
approach have counterparts in Bayesian statistics [1].

Now, if we simply consider the bit changes as a sequence,
other statistical tools might be useful. Such an interpretation might appeal to time
series analysis [3] and, from a Bayesian point of view, to dynamical systems.

5 Steganographic Systems and Detection Frameworks

Many steganographic systems can be used for embedding. Among these systems, we
can mention JSteg, JSteg-Shell, JPHide and OutGuess. Most of them work with the
DCT coefficients. As a detector counterpart, we can mention Stegdetect.

6  R-Language 

Statistical tools of all sorts are available for image processing. Packages have
been developed to process, manipulate and analyze images in many languages, such as
C, C++, Matlab and R. The last of these, in particular, provides an environment in which many
packages for image processing and binary manipulation can be used. Among these
packages we can mention DICOM, ANALYZE, NIFTI, ReadImages, RGraphics, jpeg, bmp,
png, boolfun, caTools, bindata, bit, boolean, biOps, biOpsGUI and pixmap.


7  Conclusion 

There are several steps to take from here. First, in order to better understand
the theories discussed (briefly), we must reproduce some of the results
reported in the articles. Second, statistical software tools are well established, and we
have identified the R language for implementation. Third, we must select a picture for
processing: bit manipulation, and the frequency and distribution of colors and DCT coefficients.
Fourth, use one of the software packages for steganography or, as an option, develop one in
a simple and rudimentary manner. Fifth, determine the entropy of the
picture (before and after embedding). Sixth, submit the picture to statistical
analysis (steganalysis). Seventh, develop or identify a mechanism to analyze, en masse,
pictures from the Internet or another source.

The project is ongoing.